Biological Pattern Discovery with R Machine Learning Approaches (Zheng Rong Yang)

Figure 8.19 shows a new RPN chromosome for the RPN chromosome in the upper panel of Figure 8.13. Figure 8.20 shows the resulting

The training of a GP model

The learning process is implemented as an iterative process. To start a GP process, a pool of randomised RPN chromosomes is generated. In each iteration of the learning process, the first job is to measure the fitness of each RPN chromosome to a given data set. The fitness measurement is composed of two parts: the model accuracy and the length of an RPN chromosome. The former is used to rank RPN chromosomes in terms of their fitness to a data set. The latter is used to reward more parsimonious RPN chromosomes.
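The starting pool and the evaluation of a chromosome on a data set can be sketched in R as follows. This is only a minimal illustration under stated assumptions: a chromosome is taken to be a character vector of tokens in postfix order, and the operator set, the terminal names `x1` and `x2`, and the helper names `random_rpn` and `eval_rpn` are hypothetical rather than the book's own.

```r
# Hypothetical token sets; the actual operator and terminal sets may differ.
operators <- c("+", "-", "*")
terminals <- c("x1", "x2")

# Build a random expression recursively and emit it in postfix (RPN) order.
# An expression is either a terminal or "expr expr operator", so the
# resulting chromosome is always a valid RPN sequence.
random_rpn <- function(depth) {
  if (depth == 0 || runif(1) < 0.3) {
    return(sample(terminals, 1))
  }
  c(random_rpn(depth - 1), random_rpn(depth - 1), sample(operators, 1))
}

# Evaluate an RPN chromosome on a data frame using a simple stack.
eval_rpn <- function(chrom, data) {
  stack <- list()
  for (tok in chrom) {
    if (tok %in% operators) {
      # Pop the two most recent operands, apply the operator, push the result.
      b <- stack[[length(stack)]]; stack[[length(stack)]] <- NULL
      a <- stack[[length(stack)]]; stack[[length(stack)]] <- NULL
      stack[[length(stack) + 1]] <- do.call(tok, list(a, b))
    } else {
      # Push the column of the data set named by the terminal.
      stack[[length(stack) + 1]] <- data[[tok]]
    }
  }
  stack[[1]]
}

# A pool of ten randomised chromosomes (pool size chosen arbitrarily).
set.seed(1)
pool <- replicate(10, random_rpn(depth = 3), simplify = FALSE)

# Evaluate one hand-written chromosome, (x1 + x2) * x1, on toy data.
toy <- data.frame(x1 = c(1, 2), x2 = c(3, 4))
y_hat <- eval_rpn(c("x1", "x2", "+", "x1", "*"), toy)
```

Generating chromosomes from a recursive expression grammar is one simple way to guarantee that every member of the pool can be evaluated; other initialisation schemes are equally possible.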

Suppose the problem is a regression problem. The target variable, or the dependent variable, of a regression model is denoted by $y$ and the output of a model described by an RPN chromosome is denoted by $\hat{y}$. The sum of squared errors between them, normalised by $N$, is defined as below, where $N$ is the total number of the observed target values, and $y_i$ and $\hat{y}_i$ stand for the $i$-th observed value and the $i$-th predicted value from a model defined by an RPN chromosome,

$$\varepsilon = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2 \tag{8.5}$$
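As a small illustration of Equation (8.5), the error can be computed in R in one line. The function name `model_error` and the toy vectors are assumptions made for this sketch, not part of the original text.

```r
# Error of Equation (8.5): the squared differences between predictions
# and observations, summed and divided by the number of observations N.
model_error <- function(y, y_hat) {
  stopifnot(length(y) == length(y_hat))
  mean((y_hat - y)^2)
}

# Toy observed and predicted values (made up for illustration).
y     <- c(1.0, 2.0, 3.0)
y_hat <- c(1.5, 2.0, 2.0)
eps   <- model_error(y, y_hat)  # (0.25 + 0 + 1) / 3
```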

Suppose the length of an RPN chromosome is denoted by $\ell_m$. The fitness measure is defined as below, where $m$ refers to the $m$-th model described by an RPN chromosome, $0 < \alpha < 1$ stands for the trade-off constant, and $\varepsilon_m$ stands for the sum of the squared errors measured for the $m$-th RPN chromosome,

$$\vartheta_m = \alpha \times \varepsilon_m + (1 - \alpha) \times \ell_m \tag{8.6}$$
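The trade-off in Equation (8.6) can be made concrete with a short R sketch. The function name `fitness`, the value of the trade-off constant, and the error and length values below are all invented for illustration.

```r
# Fitness of Equation (8.6): a weighted combination of the model error
# and the chromosome length, with trade-off constant 0 < alpha < 1.
fitness <- function(epsilon, len, alpha) {
  stopifnot(alpha > 0, alpha < 1)
  alpha * epsilon + (1 - alpha) * len
}

# Three hypothetical chromosomes: their errors and their lengths.
epsilon <- c(0.40, 0.35, 0.90)
len     <- c(7, 15, 3)

theta   <- fitness(epsilon, len, alpha = 0.9)
ranking <- order(theta)  # indices sorted from smallest to largest theta
```

With these made-up numbers, the second chromosome has the lowest error but is penalised for its length, so the first chromosome obtains the smallest fitness value. Because the error and the length are on different scales, the choice of the trade-off constant strongly affects which term dominates.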

The two terms of the above fitness measurement need to be as small as possible. The best model is determined by the following equation,